21 research outputs found

    Are there any ‘object detectors’ in the hidden layers of CNNs trained to identify objects or scenes?

    Various methods of measuring unit selectivity have been developed with the aim of better understanding how neural networks work. But the different measures provide divergent estimates of selectivity, and this has led to different conclusions regarding the conditions in which selective object representations are learned and the functional relevance of these representations. In an attempt to better characterize object selectivity, we undertake a comparison of various selectivity measures on a large set of units in AlexNet, including localist selectivity, precision, class-conditional mean activity selectivity (CCMAS), network dissection, the human interpretation of activation maximization (AM) images, and standard signal-detection measures. We find that the different measures provide different estimates of object selectivity, with precision and CCMAS measures providing misleadingly high estimates. Indeed, the most selective units had a poor hit rate or a high false-alarm rate (or both) in object classification, making them poor object detectors. We fail to find any units that are even remotely as selective as the 'grandmother cell' units reported in recurrent neural networks. In order to generalize these results, we compared selectivity measures on units in VGG-16 and GoogLeNet trained on the ImageNet or Places-365 datasets that have been described as 'object detectors'. Again, we find poor hit rates and high false-alarm rates for object classification. We conclude that signal-detection measures provide a better assessment of single-unit selectivity than common alternative approaches, and that deep convolutional networks for image classification do not learn object detectors in their hidden layers. Comment: Published in Vision Research 2020, 19 pages, 8 figures.
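    The signal-detection logic behind the abstract's hit-rate and false-alarm-rate argument can be sketched as follows. This is an illustrative sketch, not the paper's code: it treats a unit as a detector that "fires" when its activation exceeds a threshold, and computes hit rate, false-alarm rate, and the standard sensitivity index d' from hypothetical activations and class labels.

    ```python
    from statistics import NormalDist

    def detection_stats(activations, labels, threshold):
        """Treat a unit as an object detector that fires when activation > threshold.
        Returns (hit_rate, false_alarm_rate, d_prime)."""
        hits   = sum(1 for a, l in zip(activations, labels) if l and a > threshold)
        misses = sum(1 for a, l in zip(activations, labels) if l and a <= threshold)
        fas    = sum(1 for a, l in zip(activations, labels) if not l and a > threshold)
        crs    = sum(1 for a, l in zip(activations, labels) if not l and a <= threshold)
        hit_rate = hits / (hits + misses)
        fa_rate = fas / (fas + crs)
        # d' is the separation of the two z-transformed rates.
        z = NormalDist().inv_cdf
        d_prime = z(hit_rate) - z(fa_rate)
        return hit_rate, fa_rate, d_prime

    # Hypothetical unit activations and binary class labels (1 = target class present).
    hr, fa, dp = detection_stats([0.9, 0.2, 0.7, 0.1, 0.8, 0.3],
                                 [1, 1, 0, 0, 1, 0], threshold=0.5)
    ```

    On these invented data the unit misses a third of the targets and false-alarms on a third of non-targets, so it scores poorly as a detector despite strong activations on some targets — the pattern the abstract reports for "selective" units.
    
    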

    Does that sound right? A novel method of evaluating models of reading aloud

    Nonword pronunciation is a critical challenge for models of reading aloud, but little attention has been given to identifying the best method for assessing model predictions. The most typical approach involves comparing the model’s pronunciations of nonwords to pronunciations of the same nonwords by human participants and deeming the model’s output correct if it matches any transcription of the human pronunciations. The present paper introduces a new ratings-based method, in which participants are shown printed nonwords and asked to rate the plausibility of the provided pronunciations, generated here by a speech synthesiser. We demonstrate this method with reference to a previously published database of 915 disyllabic nonwords (Mousikou et al., 2017). We evaluated two well-known psychological models, RC00 and CDP++, as well as an additional grapheme-to-phoneme algorithm known as Sequitur, and compared our model assessment with the corpus-based method adopted by Mousikou et al. We find that the ratings method: a) is much easier to implement than a corpus-based method, b) has a high hit rate and low false-alarm rate in assessing nonword reading accuracy, and c) provides a similar outcome to the corpus-based method in its assessment of RC00 and CDP++. However, the two methods differed in their evaluation of Sequitur, which performed much better under the ratings method. Indeed, our evaluation of Sequitur revealed that the corpus-based method introduced a number of false positives and, more often, false negatives. Implications of these findings are discussed.
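    The corpus-based scoring rule the abstract describes — a model pronunciation counts as correct if it matches any human transcription — can be sketched in a few lines. The transcriptions below are invented for illustration; the point is that any plausible pronunciation absent from the human corpus is scored as an error, which is exactly the false-negative problem the abstract reports for Sequitur.

    ```python
    def corpus_correct(model_pron, human_prons):
        """Corpus-based scoring: the model's pronunciation is correct
        iff it matches at least one human participant's transcription."""
        return model_pron in set(human_prons)

    # Hypothetical transcriptions for an invented nonword.
    humans = ["brIsk@l", "brIs.k@l"]
    attested = corpus_correct("brIs.k@l", humans)   # matches a human transcription
    plausible = corpus_correct("brAIs.k@l", humans) # plausible but unattested: scored wrong
    ```

    A ratings-based method sidesteps this by asking listeners to judge the model's pronunciation directly, rather than requiring it to appear in a finite corpus.
    
    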

    Frequency Sensitivity of Neural Responses to English Verb Argument Structure Violations

    How are verb-argument structure preferences acquired? Children typically receive very little negative evidence, raising the question of how they come to understand the restrictions on grammatical constructions. Statistical learning theories propose that stochastic patterns in the input contain sufficient clues. For example, if a verb is very common but never observed in transitive constructions, this would indicate that transitive usage of that verb is illegal. Ambridge et al. (2008) have shown that in offline grammaticality judgements of intransitive verbs used in transitive constructions, low-frequency verbs elicit higher acceptability ratings than high-frequency verbs, as predicted if relative frequency is a cue during statistical learning. Here, we investigate whether the same pattern also emerges in online processing of English sentences. EEG was recorded while healthy adults listened to sentences featuring transitive uses of semantically matched verb pairs of differing frequencies. We replicate the finding of higher acceptability of transitive uses of low- vs. high-frequency intransitive verbs. Event-related potentials indicate a similar result: early electrophysiological signals distinguish between misuse of high- vs. low-frequency verbs. This indicates that online processing shows a similar sensitivity to frequency as offline judgements, consistent with a parser that reflects an original acquisition of grammatical constructions via statistical cues. However, the nature of the observed neural responses was not of the expected, or an easily interpretable, form, motivating further work into neural correlates of online processing of syntactic constructions.

    The human visual system and CNNs can both support robust online translation tolerance following extreme displacements

    Visual translation tolerance refers to our capacity to recognize objects over a wide range of different retinal locations. Although translation is perhaps the simplest spatial transform that the visual system needs to cope with, the extent to which the human visual system can identify objects at previously unseen locations is unclear, with some studies reporting near complete invariance over 10 degrees and others reporting zero invariance at 4 degrees of visual angle. Similarly, there is confusion regarding the extent of translation tolerance in computational models of vision, as well as the degree of match between human and model performance. Here, we report a series of eye-tracking studies (total N = 70) demonstrating that novel objects trained at one retinal location can be recognized at high accuracy rates following translations up to 18 degrees. We also show that standard deep convolutional neural networks (DCNNs) support our findings when pretrained to classify another set of stimuli across a range of locations, or when a global average pooling (GAP) layer is added to produce larger receptive fields. Our findings provide a strong constraint for theories of human vision and help explain inconsistent findings previously reported with convolutional neural networks (CNNs).
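    The global average pooling operation mentioned above is simple enough to sketch directly. This is an illustrative NumPy version, not the paper's model code: GAP collapses each channel's spatial map to a single mean value, so every downstream unit sees the whole image regardless of where in the visual field a feature appeared — which is why it increases translation tolerance.

    ```python
    import numpy as np

    def global_average_pool(feature_maps):
        """Global average pooling: reduce (channels, height, width)
        to (channels,) by averaging each channel over its spatial map."""
        return feature_maps.mean(axis=(1, 2))

    # A toy stack of two 4x4 feature maps.
    fmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
    pooled = global_average_pool(fmap)  # one scalar per channel
    ```

    Because the mean is invariant to where activity falls within the map, shifting a feature across the 4x4 grid leaves the pooled value unchanged.
    
    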

    Clarifying status of DNNs as models of human vision

    On several key issues we agree with the commentators. Perhaps most importantly, everyone seems to agree that psychology has an important role to play in building better models of human vision, and (most) everyone agrees (including us) that deep neural networks (DNNs) will play an important role in modelling human vision going forward. But there are also disagreements about what models are for, how DNN-human correspondences should be evaluated, the value of alternative modelling approaches, and the impact of marketing hype in the literature. In our view, these latter issues are contributing to many unjustified claims regarding DNN-human correspondences in vision and other domains of cognition. We explore all these issues in this response.

    Children Use Statistics and Semantics in the Retreat from Overgeneralization

    How do children learn to restrict their productivity and avoid ungrammatical utterances? The present study addresses this question by examining why some verbs are used with un- prefixation (e.g., unwrap) and others are not (e.g., *unsqueeze). Experiment 1 used a priming methodology to examine children's (3–4; 5–6) grammatical restrictions on verbal un- prefixation. To elicit production of un- prefixed verbs, test trials were preceded by a prime sentence, which described reversal actions with grammatical un- prefixed verbs (e.g., Marge folded her arms and then she unfolded them). Children then completed target sentences by describing cartoon reversal actions corresponding to (potentially) un- prefixed verbs. The younger age-group's production probability of verbs in un- form was negatively related to the frequency of the target verb in bare form (e.g., squeez/e/ed/es/ing), while the production probability of verbs in un- form for both age groups was negatively predicted by the frequency of synonyms to a verb's un- form (e.g., release/*unsqueeze). In Experiment 2, the same children rated the grammaticality of all verbs in un- form. The older age-group's grammaticality judgments were (a) positively predicted by the extent to which each verb was semantically consistent with a semantic “cryptotype” of meanings - where “cryptotype” refers to a covert category of overlapping, probabilistic meanings that are difficult to access - hypothesised to be shared by verbs which take un-, and (b) negatively predicted by the frequency of synonyms to a verb's un- form. Taken together, these experiments demonstrate that children as young as 4;0 employ pre-emption and entrenchment to restrict generalizations, and that use of a semantic cryptotype to guide judgments of overgeneralizations is also evident by age 6;0. Thus, even early developmental accounts of children's restriction of productivity must encompass a mechanism in which a verb's semantic and statistical properties interact.

    Quantifying sources of variability in infancy research using the infant-directed-speech preference

    Psychological scientists have become increasingly concerned with issues related to methodology and replicability, and infancy researchers in particular face specific challenges related to replicability: For example, high-powered studies are difficult to conduct, testing conditions vary across labs, and different labs have access to different infant populations. Addressing these concerns, we report on a large-scale, multisite study aimed at (a) assessing the overall replicability of a single theoretically important phenomenon and (b) examining methodological, cultural, and developmental moderators. We focus on infants’ preference for infant-directed speech (IDS) over adult-directed speech (ADS). Stimuli of mothers speaking to their infants and to an adult in North American English were created using seminaturalistic laboratory-based audio recordings. Infants’ relative preference for IDS and ADS was assessed across 67 laboratories in North America, Europe, Australia, and Asia using the three common methods for measuring infants’ discrimination (head-turn preference, central fixation, and eye tracking). The overall meta-analytic effect size (Cohen’s d) was 0.35, 95% confidence interval = [0.29, 0.42], which was reliably above zero but smaller than the meta-analytic mean computed from previous literature (0.67). The IDS preference was significantly stronger in older children, in those children for whom the stimuli matched their native language and dialect, and in data from labs using the head-turn preference procedure. Together, these findings replicate the IDS preference but suggest that its magnitude is modulated by development, native-language experience, and testing procedure. (This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie SkƂodowska-Curie grant agreement No 798658.)
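    The effect size reported above can be illustrated with a minimal sketch. This is not the study's analysis pipeline: it computes a within-infant Cohen's d for an IDS-vs-ADS preference as the mean looking-time difference scaled by the standard deviation of those differences, on invented data for four infants.

    ```python
    from statistics import mean, stdev

    def cohens_d_paired(ids_looks, ads_looks):
        """Within-subject Cohen's d: mean IDS-ADS looking-time difference,
        divided by the standard deviation of the differences."""
        diffs = [i - a for i, a in zip(ids_looks, ads_looks)]
        return mean(diffs) / stdev(diffs)

    # Hypothetical looking times (seconds) for four infants.
    ids_looking = [10.0, 12.0, 9.0, 11.0]
    ads_looking = [8.0, 11.0, 9.0, 9.0]
    d = cohens_d_paired(ids_looking, ads_looking)
    ```

    A multisite study like the one described would then meta-analytically combine per-lab effect sizes, weighting by their precision, to obtain the overall estimate and confidence interval.
    
    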